Bulk http status & 429 retries #1868

gareth-ellis · 2024-08-02T17:40:55Z

Future direction:

need to avoid messy nested fields
service_time should be a sum of all retries
follow approach of other products, such as logstash

gbanasiak · 2024-12-19T09:35:22Z

Context: https://elastic.slack.com/archives/C03P67VQAP2/p1721939793707489

gareth-ellis · 2025-11-18T20:18:27Z

@elasticmachine update branch

gareth-ellis · 2026-01-16T10:35:00Z

My approach so far seems to work. Please take a look.

gareth-ellis · 2026-01-16T12:29:04Z

@elasticmachine update branch

gareth-ellis · 2026-01-19T08:24:24Z

The failures are due to me changing the parameters for detailed_stats - https://github.com/elastic/rally-tracks/blob/master/elastic/shared/runners/bulk.py#L36C25-L36C39

I guess I can either:
a) implement the retry in a new operation, e.g. RetryingBulk?
b) avoid changing the parameters and pass the flag via params, though I would then need to avoid returning lines_to_retry
c) update elastic/logs, but that would still risk breaking other tracks that external users have written (if they follow the same approach as elastic/logs)

Thoughts?

gbanasiak

This is an interesting coupling problem with super().detailed_stats(...) used in elastic/logs. Track versioning follows ES only, not Rally, so changing the custom detailed_stats() as in the snippet below will fail in an old Rally when it calls the parent's method.

    def detailed_stats(self, params, response, emit_lines_to_retry=False):
        stats = super().detailed_stats(params, response, emit_lines_to_retry)  # <--- HERE
        return {**stats, **params["param-source-stats"]}
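The failure mode can be reproduced without Rally. The following is a minimal sketch, assuming a hypothetical stand-in class OldBulkIndex that mimics the two-argument detailed_stats() of an older Rally; the class names, the stats keys, and the sample params are illustrative only and are not taken from Rally or elastic/logs.

```python
class OldBulkIndex:
    """Hypothetical stand-in for the parent runner shipped with an older Rally."""

    def detailed_stats(self, params, response):
        # Old signature: no emit_lines_to_retry argument.
        return {"success-count": len(response.get("items", []))}


class TrackRunner(OldBulkIndex):
    """Hypothetical track runner already updated for the new signature."""

    def detailed_stats(self, params, response, emit_lines_to_retry=False):
        # Passing the extra positional argument breaks against the old parent.
        stats = super().detailed_stats(params, response, emit_lines_to_retry)
        return {**stats, **params["param-source-stats"]}


def call_detailed_stats(runner):
    # Illustrative params only; real tracks populate param-source-stats themselves.
    params = {"param-source-stats": {"raw-size-bytes": 10}}
    try:
        return runner.detailed_stats(params, {"items": [{}]})
    except TypeError as e:
        return f"TypeError: {e}"


outcome = call_detailed_stats(TrackRunner())
print(outcome)
```

The call raises a TypeError because the parent method receives one positional argument more than it accepts, which is the breakage an updated track would hit on an old Rally.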

It would be best if we minimized the surface between tracks code and Rally code. Our docs only mention the top-level runner call with es and params arguments. Because of this, I would be happy to declare what this particular runner is doing as unsupported. I would even go as far as clarifying this in the documentation.

My vote would be to change elastic/logs custom runner to avoid calling detailed_stats() completely and backport this to all branches where modified Rally might potentially be used, so I'd say all 8.x up until now.

Something like this perhaps?

    class RawBulkIndex(BulkIndex):
        async def __call__(self, es, params):
            meta_data = await super().__call__(es, params)
            if params.get("detailed-results", False):
                meta_data.update(params["param-source-stats"])
            return meta_data

The alternative would be to keep detailed_results() as is and determine the documents to retry separately, but that means iterating through the response twice, which would impact processing time.

I'm curious about @fressi-elastic's thoughts on that one as well.

I'm adding further comments below.

I think we also planned on adding exponential back-off. Is this still the plan, or is it out of scope for this PR?
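For reference, exponential back-off around a retry loop could look roughly like the sketch below. It is not the PR's implementation: send_bulk is a hypothetical coroutine that sends the given lines and returns the lines rejected with a 429, and the defaults for base, cap, and max_retries are arbitrary assumptions.

```python
import asyncio


def backoff_delays(max_retries, base=0.5, cap=30.0):
    """Delay in seconds before each retry: base * 2**attempt, limited to cap."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]


async def bulk_with_backoff(send_bulk, lines, max_retries=3, base=0.5, cap=30.0, sleep=asyncio.sleep):
    """Send lines, then retry whatever send_bulk reports as rejected.

    send_bulk is a hypothetical coroutine returning the lines still to retry.
    Returns the number of retries performed and the lines left unsent.
    """
    retries = 0
    lines_to_retry = await send_bulk(lines)
    for delay in backoff_delays(max_retries, base, cap):
        if not lines_to_retry:
            break
        # Wait longer before each successive retry.
        await sleep(delay)
        retries += 1
        lines_to_retry = await send_bulk(lines_to_retry)
    return retries, lines_to_retry
```

The sleep function is injectable so that the loop can be exercised in tests without real waiting. Adding random jitter to each delay is a common refinement that this sketch leaves out.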

gbanasiak · 2026-01-20T14:29:43Z

esrally/driver/runner.py

+                break
+            self.logger.warning("Retrying %d documents that previously resulted in a 429.", len(lines_to_retry) / 2)
+            api_kwargs["body"] = lines_to_retry
+            bulk_size = len(lines_to_retry) / 2  # at this point the data always contains action metadata.


at this point the data always contains action metadata

This is slightly off-topic:

I spent some time digging once I saw this comment. I think the bulk runner always receives action-and-metadata lines in the body param today (see here). If the corpus does not include them, they are generated in earlier processing stages. I don't quite understand this part of the code above:

    if with_action_metadata:
        api_kwargs.pop("index", None)
        # only half of the lines are documents
        response = await es.bulk(params=bulk_params, **api_kwargs)
    else:
        response = await es.bulk(doc_type=params.get("type"), params=bulk_params, **api_kwargs)

The "only half of the lines are documents" comment suggests the else clause is different, but it isn't. There's nothing in the bulk() method of the ES client that would magically add action-and-metadata lines. Also, I think doc_type is ignored; it's a leftover from old ES versions.

I think we could simplify / remove this.
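To make the "half of the lines" point concrete: when every document line is preceded by an action-and-metadata line, the document count is the line count divided by two. The helper below is a minimal sketch with a hypothetical name, assuming only actions that carry a source line (a delete action, for example, would break the pairing). It uses integer division because `/` returns a float in Python 3.

```python
def count_bulk_documents(body):
    """Count documents in a bulk body made of action-and-metadata/source line pairs.

    Accepts a list of lines or a newline-delimited str/bytes body.
    Hypothetical helper for illustration; not part of Rally.
    """
    if isinstance(body, (bytes, str)):
        sep = b"\n" if isinstance(body, bytes) else "\n"
        # Drop the empty entry produced by the trailing newline.
        lines = [line for line in body.split(sep) if line]
    else:
        lines = list(body)
    if len(lines) % 2 != 0:
        raise ValueError("expected an even number of lines (metadata + source pairs)")
    # Integer division keeps the result an int; len(lines) / 2 would be a float.
    return len(lines) // 2


pairs = ['{"index": {}}', '{"field": 1}', '{"index": {}}', '{"field": 2}']
print(count_bulk_documents(pairs))
```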

gareth-ellis · 2026-01-21

Yeah, it seems that at one point we calculated the number of documents here, but that was removed and the comment was left behind. I'll remove the comment for now. I think I would need to test a little more to see whether we have code paths that use with_action_metadata or not.

gbanasiak · 2026-01-20T14:40:26Z

esrally/driver/runner.py

+            if detailed_results:
+                stats = {
+                    "success-count": total_success,
+                    "error-count": total_error,
+                    "retry-count": retry_count,
+                    "took": total_time,
+                    "success": len(lines_to_retry) == 0,
+                    "retried": retry_count > 0,
+                    "bulk-request-size-bytes": sum_bulk_request_size_bytes,
+                    "total-document-size-bytes": sum_total_document_size_bytes,
+                    "ops": {},  # detailed per-op stats are not aggregated over retries
+                    "shards_histogram": [],  # detailed per-shard stats are not aggregated over retries
+                }
+            else:
+                stats = {
+                    "success-count": total_success,
+                    "error-count": total_error,
+                    "retry-count": retry_count,
+                    "took": total_time,
+                    "success": len(lines_to_retry) == 0,
+                    "retried": retry_count > 0,
+                }


This exposes the details of stats at the __call__() level, which were previously hidden in either simple_stats() or detailed_stats(). Can we avoid this? We could have a method that iterates through the response documents and:

calls another method (passed as an argument) that increases stats counters for each document,

builds a retry list (optionally).
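A rough sketch of that shape follows. The names walk_bulk_items and make_simple_counter are hypothetical, the success check is simplified to the HTTP status alone, and the retry list assumes that every document occupies two consecutive body lines (action-and-metadata plus source).

```python
def walk_bulk_items(response, body_lines, on_item, collect_retries=False):
    """Iterate once over the bulk response items.

    Calls on_item(op, data) for every item so the caller decides which
    counters to update, and optionally collects the body lines of items
    rejected with a 429.
    """
    lines_to_retry = []
    for idx, item in enumerate(response["items"]):
        # Each item is a single-key dict, e.g. {"index": {"status": 201, ...}}.
        op, data = next(iter(item.items()))
        on_item(op, data)
        if collect_retries and data.get("status") == 429:
            # Item idx maps to the metadata line and source line at 2*idx and 2*idx + 1.
            lines_to_retry.extend(body_lines[2 * idx : 2 * idx + 2])
    return lines_to_retry


def make_simple_counter():
    """Build a callback that only tracks success and error counts."""
    counts = {"success-count": 0, "error-count": 0}

    def on_item(op, data):
        # Simplified check: any status above 299 counts as an error.
        if data.get("status", 0) > 299:
            counts["error-count"] += 1
        else:
            counts["success-count"] += 1

    return counts, on_item


response = {"items": [{"index": {"status": 201}}, {"index": {"status": 429}}, {"index": {"status": 200}}]}
body = ["a0", "d0", "a1", "d1", "a2", "d2"]
counts, on_item = make_simple_counter()
retry = walk_bulk_items(response, body, on_item, collect_retries=True)
print(counts, retry)
```

With this split, a detailed variant would only need a different callback, and __call__() would not have to know which keys each stats flavour produces.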

gareth-ellis added 5 commits July 26, 2024 14:06

Add http status to bulk request metrics

c146b01

Check if is ApiResponse first

f4e897d

Add retry

b879ca8

Fix bulk retry on 429

bad2ba2

Extra stats

a1eba72

gareth-ellis mentioned this pull request Jan 21, 2025

Improve retry logic for Bulk Requests #1905

Open

Merge branch 'master' into bulk-http-status

d1c9146

fressi-elastic self-requested a review November 19, 2025 14:51

Improve approach

e6518e4

gareth-ellis requested a review from gbanasiak January 16, 2026 10:34

gareth-ellis marked this pull request as ready for review January 16, 2026 10:34

elasticmachine and others added 2 commits January 16, 2026 13:29

Merge branch 'master' into bulk-http-status

9f6b49b

Fixup tests, retain document size for detailed_stats

2996be1

Remove excessive logging

4f75b96

gbanasiak reviewed Jan 20, 2026

View reviewed changes

gareth-ellis added 2 commits January 21, 2026 13:53

Fix for simple

04879db

Remove misleading comment

c2f1704

gareth-ellis mentioned this pull request Jan 21, 2026

Move extra metadata logic to __call__ elastic/rally-tracks#995

Merged
